Abstract


Physical  and  behavioral  methods  were  employed  to  determine  the 
characteristics  of  a  binaural  microphone  array  and  its  ability  to  support  the 
localization  of  sources  of  sound.  The  gain  pattern  and  output  delay  of  the 
array  were  measured  and  compared  to  those  of  an  acoustic  mannequin. 
Three  behavioral  methods  of  assessing  the  adequacy  of  binaural 
information  provided  by  the  array  were  employed.  These  include  (a) 
measuring  the  minimum  audible  angle,  (b)  defining  the  limit  of  image 
lateralization,  and  (c)  determining  the  accuracy  with  which  listeners  could 
point  the  array  at  an  acoustic  source.  Insights  about  transformations 
produced  by  the  array  were  provided  by  a  modeling  effort.  The  data 
indicate  that  the  present  design  would  provide  reliable  information  when  it 
is  used  as  a  dynamic  pointing  device. 
ASSESSMENT OF A BINAURAL MICROPHONE ARRAY


INTRODUCTION 

In March 1995, the Auditory Research Team of the Human Engineering and Research
Directorate  of  the  U.S.  Army  Research  Laboratory  (ARL)  was  asked  to  evaluate  the  long 
distance  listening  device  (LDLD)  developed  by  ARL’s  Sensors  Directorate  and  to  provide 
information  about  the  ability  of  this  array  to  enhance  a  person’s  ability  to  localize  sources  of 
sound.  This  listening  device  (Scanlon  &  Tenney,  1994)  consists  of  two  end-fire  linear  arrays,  one 
for  each  ear.  Each  linear  array  has  nine  cardioid  electret  microphones  spaced  about  4.2  cm  apart 
and  oriented  so  that  the  area  of  greatest  sensitivity  of  each  microphone  was  directed  forward 
along  the  long  axis  of  the  array.  Delays  are  added  to  the  signals  arriving  from  individual 
microphones  so  that  as  acoustic  energy  travels  down  the  array,  the  microphone  outputs  are 
summed  approximately  in  phase.  The  result  is  a  unidirectional  pattern  of  sensitivity  for  the 
array  with  considerable  gain  for  signals  arriving  along  the  axis  of  the  array. 
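The delay-and-sum principle described above can be sketched numerically. The following is a minimal illustration, not the LDLD's actual signal processing: it assumes an ideal plane wave, nine identical elements spaced 4.2 cm apart, a speed of sound of 343 m/s, and the standard cardioid element pattern (1 + cos θ). The function names `steering_delays` and `array_gain` are hypothetical, and the choice to give the front microphone the longest delay is an assumption.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value, not taken from the report


def steering_delays(n_mics=9, spacing_m=0.042, c=SPEED_OF_SOUND):
    """Delay (s) added to each microphone output for end-fire summation.

    Microphone 0 is assumed to be at the front of the arm, so it receives
    the longest delay and the rearmost microphone receives none.
    """
    return [(n_mics - 1 - k) * spacing_m / c for k in range(n_mics)]


def array_gain(theta_deg, freq_hz, n_mics=9, spacing_m=0.042,
               c=SPEED_OF_SOUND, cardioid=True):
    """Amplitude gain of one arm, normalized to 1 for an on-axis source."""
    theta = math.radians(theta_deg)
    delays = steering_delays(n_mics, spacing_m, c)
    total = 0j
    for k in range(n_mics):
        # A plane wave from off-axis angle theta reaches microphone k this
        # much later than it reaches microphone 0.
        arrival = k * spacing_m * math.cos(theta) / c
        total += cmath.exp(-2j * math.pi * freq_hz * (arrival + delays[k]))
    gain = abs(total) / n_mics
    if cardioid:
        # Standard cardioid element pattern, scaled to 1 on axis.
        gain *= (1 + math.cos(theta)) / 2
    return gain


if __name__ == "__main__":
    for angle in (0, 30, 60, 90, 180):
        level_db = 20 * math.log10(max(array_gain(angle, 2000.0), 1e-12))
        print(f"{angle:4d} deg  {level_db:8.1f} dB")
```

For an on-axis source, the acoustic travel time and the added delay sum to the same value at every microphone, so the outputs add in phase; off axis they do not, which is the source of the unidirectional pattern.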

The  intention  of  the  ARL  Sensors  Directorate  is  to  develop  a  listening  device  that  can  be 
carried  in  the  field  mounted  on  a  rifle  or  that  could  be  adapted  to  mounting  at  a  fixed  location. 
Optimizing  the  detection  of  sounds  is  the  paramount  goal,  but  once  a  monaural  channel  is 
defined,  it  is  simple  enough  to  provide  a  binaural  capability.  In  this  case,  the  amplified  outputs 
of  the  left  and  right  arms  of  the  array  were  led  separately  to  the  left  and  right  ears  of  the  listener. 
Thus  a  binaural  array  has  the  potential  for  enhancing  soldiers’  ability  to  localize  sources  of 
sound.  The  localization  of  sounds  detected  through  a  binaural  array  represents  a  new  and 
important  capability  for  soldiers,  so  our  focus  has  been  the  potential  for  spatial  selectivity 
provided  by  a  binaural  listening  device.  In  particular,  we  sought  information  that  might  constrain 
the  design  of  a  digital  version  of  the  long  distance  array.  For  ease  of  discussion,  the  Army’s 
LDLD  will  be  referred  to  as  a  binaural  array.  The  reader  should  keep  in  mind,  however,  that  we 
are  referring  to  a  specific  device  with  its  particular  geometrical  arrangement  of  directional 
microphones  and  its  specific  signal  processing  and  amplification. 

The pattern of gain of a cardioid microphone is given as 1 + cos θ, in which θ is the off-axis
angle of the source. Each element of the array provides a gain of 2 for signals arriving from
sources directly ahead (in which θ = 0), and all the microphones were mounted similarly, with
their  preferred  orientation  aligned  with  the  long  axis  of  the  arm.  By  adjusting  the  gain  and  phase 
of  the  individual  microphone  responses  across  frequency,  the  “beam”  of  sensitivity  of  the  array 
can  be  shaped  further  and  the  final  output  of  the  array  can  be  amplified  to  increase  auditory 
detection for sources located within the beam. Unaided binaural listening can increase detection
thresholds  by  about  3  dB  relative  to  monaural  listening,  and  this  binaural  advantage  can  grow  to 
as  much  as  12  dB  as  signal  levels  increase  above  threshold  (Reynolds  &  Stevens,  1960),  an 
amount  important  for  the  recognition  of  sound  sources.  A  typical  value  for  the  binaural  loudness 
summation  of  speech  signals  (between  60  and  70  dB)  is  about  6  dB.  This  advantage  is  in  addition 
to  the  almost  25  dB  of  directional  gain  provided  by  each  arm  of  the  array. 

However,  the  outputs  of  the  left  and  right  arms  of  the  array  are  led  independently  to  the 
left  and  right  ears  of  the  listener,  making  listening  with  the  array  similar  to  listening  with 
stereophonic  headphones.  Under  such  an  arrangement,  sound  images  are  internalized  and  the 
spatial  auditory  world  is  represented  as  lateral  positions  along  an  interaural  axis.  This  may  be  a 
limitation  of  the  LDLD.  Cues  used  by  listeners  to  localize  sounds  in  a  free  field  move  the  sound 
image  left  and  right  along  this  interaural  axis.  When  headphone  presentation  has  been  used  to 
study  mechanisms  for  sound  localization,  the  term  “lateralization”  is  used  to  describe  the 
listening  task.  This  is  because  the  interaural  differences  that  are  the  cues  for  the  localization  of 
free-field sounds cause internalized sound images to move away from a centered position to more
lateral  ones.  In  the  LDLD,  the  left  and  right  arms  of  the  array  were  connected  and  hinged  at  one 
end  so  that  its  beams  could  be  separated,  thus  allowing  a  variable  area  to  be  scanned.  This 
arrangement  creates  interaural  differences  of  arrival  time  with  the  arm  closest  to  the  source 
receiving  the  signal  first,  just  as  for  normal  binaural  hearing. 

The  ear  is  quite  sensitive  to  interaural  time  differences  carried  by  low  frequency  tones;  for 
instance,  thresholds  for  one  ear  leading  the  other  can  be  as  low  as  10  to  15  microseconds  (Hafter 
&  DeMaio,  1975).  In  addition,  there  is  the  belief  that  for  wide-band  signals,  low  frequency  time 
differences  are  the  dominant  cue  for  source  azimuth  (Wightman  &  Kistler,  1992).  The  utility  of 
ongoing  interaural  time  differences,  however,  is  limited  by  the  size  of  the  head.  The  average  male 
head is 7.25 inches in diameter which can cause approximately 600 to 660 microseconds of
interaural delay. Above 1200 to 1500 Hz, the interaural phase difference attributable to
interaural travel time can reach and exceed 180° and thus make this cue ambiguous. In fact, it is in
this  frequency  range  where  human  sensitivity  to  interaural  timing  relations  within  the  fine 
structure  of  signals  disappears. 
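As a rough check on the size of head-related delays, a common spherical-head approximation (Woodworth's formula, ITD = (r/c)(θ + sin θ) for azimuths from 0° to 90°) can be evaluated for a head of this diameter. This is a sketch under stated assumptions: a rigid sphere of radius 0.092 m (half of 7.25 inches), a speed of sound of 343 m/s, and a distant source. The function name `woodworth_itd` is hypothetical. The formula gives a maximum delay of the same order as the 600 to 660 microseconds quoted above, not an exact match.

```python
import math


def woodworth_itd(azimuth_deg, head_radius_m=0.092, c=343.0):
    """Interaural time difference (s) for a rigid spherical head.

    Valid for azimuths between 0 and 90 degrees; radius and speed of
    sound are assumed values.
    """
    theta = math.radians(azimuth_deg)
    return head_radius_m / c * (theta + math.sin(theta))


if __name__ == "__main__":
    for az in (0, 15, 30, 45, 60, 90):
        print(f"{az:3d} deg  {woodworth_itd(az) * 1e6:7.1f} microseconds")
```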

In  addition  to  differences  of  interaural  time,  interaural  differences  of  level  are  used  for 
localization  as  the  ear  farthest  from  a  source  of  sound  has  its  signal  attenuated  (shadowed)  by  the 
head.  These  differences  of  level  arise  from  the  diffraction  and  reflection  of  acoustic  energy  as  it 
encounters the head, and they vary with the frequency of the signal. For a human head, there are
almost no differences of level at low frequencies, but for frequencies above 1200 to 1500 Hz,
interaural  level  differences  can  grow  to  as  much  as  20  to  25  dB.  Shaw  and  Vaillancourt  (1985) 
have  summarized  measurements  of  interaural  differences  of  level  made  on  a  large  number  of 
human  subjects.  Interaural  differences  of  level  are  also  created  by  the  binaural  array  when  the 
arms  are  pointed  at  different  azimuths  because  the  sensitivity  contour  of  each  arm  is  highly 
directional.  Only  when  the  angular  separation  of  the  arms  is  small  or  when  a  source  is  located 
along  the  axis  bisecting  the  array  will  the  output  of  each  arm  be  similar.  In  addition,  there  may  be 
interactions between the arms that make the operation of the array different from human hearing,
such as one arm narrowly shadowing the other or reflecting energy to it.

In  essence,  we  would  expect  listening  through  the  binaural  array  to  be  similar  to  human 
binaural  hearing  in  that  interaural  time  differences  would  be  minimal  (or  nonexistent)  for  sources 
along  the  midline  and  would  increase  to  a  maximum  for  sources  at  90°  or  270°.  Interaural 
differences  of  level  would  be  minimal  when  a  source  is  directly  ahead  of  the  array,  but  for  the 
array,  interaural  differences  of  level  do  not  depend  mainly  upon  shadowing  from  the  head  but 
upon  the  shape  of  the  beam  pattern  of  each  arm  of  the  array  and  its  relation  to  the  source  of 
sound.  We  will  see  later  that  interaural  differences  of  level  provided  by  the  array  may  be  larger 
than  those  normally  generated  by  a  human  head,  and  they  may  grow  faster  for  the  array  than  for 
the  human  head  as  the  source  of  sound  moves  away  from  the  midline.  However  they  arise,  the 
array  presents  different  values  of  interaural  time  and  level  than  those  created  by  the  bare  head, 
which  may  lead  to  less  than  optimum  sound  localization.  Unfortunately,  there  are  very  few  data 
concerning  human  adaptation  to  distorted  or  novel  patterns  of  localization  cues.  The  recent 
exceptions  are  Barbara  Shinn-Cunningham’s  (1994)  dissertation  and  the  paper  by  Shinn- 
Cunningham,  Durlach,  and  Held  (1997).  The  latter  is  the  most  recent  of  a  series  of  articles 
(Durlach  &  Pang,  1986;  Van  Veen  &  Jenison,  1991;  Durlach,  1991)  where  techniques  to  provide 
supra-normal  interaural  cues  were  examined.  Shinn-Cunningham  (1994)  changed  the  relation 
associating  head-related  transfer  functions  to  the  azimuth  they  represent  in  a  way  that  expanded 
auditory  space  near  the  midline  and  compressed  it  laterally.  This  arrangement  distorted 
localization  judgments  unless  listeners  were  trained  (with  feedback)  to  adapt  to  the  distortion. 
When  the  distortion  was  removed,  listeners  required  time  (and  practice)  to  adapt  back  to  the 
normal  arrangement  of  apparent  and  real  source  locations.  The  conclusion  is  that  listeners  can 
accommodate  to  re-arranged  cues  and  can  learn  to  switch  from  one  set  of  learned  relations  to 
another. 

As  a  result  of  these  considerations,  an  important  design  question  concerns  the  desirable 
angular  separation  of  the  axes  of  the  array  and  a  possible  trading  of  detection  and  localization 
performance. If the beams are superimposed, can signals arriving over the array be adequately
localized  or  is  the  3-  to  6-dB  gain  of  signal  level  the  only  advantage  of  the  array?  Can  an  angular 
separation  be  determined  that  will  maximize  both  the  detection  and  localization  of  the  sources  of 
sounds  heard  through  the  array?  Informal  listening  with  the  device  indicated  that  when  the  array 
was  rotated  in  the  horizontal  plane  so  that  the  left  and  right  beams  sequentially  moved  past  a 
source  of  sound,  the  internalized  sound  image  appeared  to  move  faster  than  the  array  was  being 
swept past the source. This would indicate a horizontal “magnification” of the acoustic world
within  the  beams  of  the  array. 

To  provide  some  insight  about  the  localization  of  sounds  detected  by  listening  through  a 
binaural  array,  we  decided  to  make  two  sets  of  observations:  (1)  physical  measurements  of  the 
action  of  the  array,  and  (2)  behavioral  measurements  of  listening  aided  by  the  array.  Plots  of  the 
patterns  of  gain  for  both  the  array  and  for  a  human  head  and  torso  would  facilitate  comparison  of 
localization  aided  by  the  array  to  unaided  localization.  These  measurements  would  reflect  the 
physical  constraints  of  the  current  design  of  the  LDLD  and  of  its  human  users.  There  is  little  in 
the  literature  of  binaural  listening  that  can  provide  guidance  for  the  design  of  devices  that  impose 
their  own  pattern  of  interaural  time  and  intensity  differences  onto  signals  that  listeners  then  have 
to  interpret  with  their  constant  everyday  experience.  This  is  the  reason  we  have  attempted  to 
characterize,  at  least  in  a  general  way,  the  action  of  the  binaural  array  and  the  acoustic 
information  it  makes  available  for  sound  localization. 

In  addition,  measurements  of  listeners’  spatial  resolution  while  listening  through  the  array 
would  indicate  the  system  performance  that  could  be  expected  for  different  angular  separations  of 
the  beams.  In  this  case,  performance  would  take  into  account  the  binaural  processing  ability  of 
the human listeners as well as the information being presented to them by the array. Accurately
measuring  human  perception  is  time  consuming  and  for  this  investigation,  equipment  was 
available  sporadically  so  some  of  our  measurements  are  scant. 

Throughout this report, we refer either to the angle formed by the two beams of the array
or to the direction to which the array is pointing relative to the source of a signal. Figure 1
depicts the possibilities. We use (α) to refer to the angle of beam separation of the array and (θ)
to denote the azimuth of the axis of the array relative to an arbitrary zero. In addition, we use
positive  values  for  azimuth  to  indicate  clockwise  rotation.  For  instance,  a  loudspeaker  is  located 
at  0°  in  Figure  1  and  the  array  is  pointing  at  approximately  +45°  relative  to  an  observer  facing  the 
loudspeaker. 
Figure 1. Angular relations of array and loudspeaker.


Later,  we  present  interaural  differences  measured  within  the  band  pass  of  the  array,  and  these 
were  calculated  as  the  arrival  time  or  level  for  the  left  ear  minus  that  for  the  right  ear.  Thus,  in 
this  report  positive  values  of  interaural  differences  will  correspond  to  positive  values  of  azimuth 
for  the  location  of  the  sound  source. 

PHYSICAL  ASSESSMENT 
Patterns  of  Gain 
The  Array 

The  output  of  the  LDLD  is  severely  band  limited  above  6  kHz  to  prevent  spatial 
aliasing,  and  the  spacing  of  the  ports  of  the  cardioid  microphones  created  an  attenuation  of -10 
dB  per  decade  below  1  kHz.  We  measured  sensitivity  within  the  band  pass  of  the  array  using 
sinusoids of 1, 2, 3.2, 4, and 6.4 kHz. Initially, we chose a beam separation angle of α = 45°.

The sensitivity measurements were made while the array was situated in a corner of the ARL
anechoic  chamber  with  the  loudspeaker  about  3  m  away  and  at  the  same  height  as  the  array.  The 
array  was  placed  on  a  Bruel  and  Kjaer  Type  3921  turntable  that  rotated  approximately  4.5°  per 
second.  Continuous  signals  were  presented  while  the  array  was  rotated,  and  the  responses  of 
each  arm  of  the  array  were  recorded  simultaneously.  Signal  generation  and  presentation  were 
under  computer  control  as  were  the  recording  and  analysis  of  the  measurements.  The  rotating 
table  had  to  be  started  manually,  so  it  was  started  20°  to  30°  ahead  of  zero  (the  axis  of  the  array 
pointing  toward  the  loudspeaker),  and  the  individual  plots  of  the  -3  dB  sensitivity  contours  were 
later  rotated  to  align  with  the  origin  of  the  polar  plot.  The  result  was  that  noise  was  added  to 
these traces (typically in the second quadrant between 100° and 130°) that was produced by the
experimenter leaving the anechoic chamber after activating the table; it is not part of the response
of the array. The measurements made with α = 45° are shown in Figures 2 through 6. In these
plots,  as  well  as  in  later  ones,  the  largest  value  of  each  ear’s  magnitude  response  was  set  to  0  dB, 
and  the  rest  of  the  measurements  were  scaled  appropriately. 
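The normalization used for these plots can be illustrated with a short sketch. It assumes the recorded responses are positive linear amplitudes, one value per rotation angle and per ear; the function name `normalize_db` and the sample values are hypothetical.

```python
import math


def normalize_db(magnitudes):
    """Express linear amplitudes in dB relative to the largest value.

    The peak maps to 0 dB and every other value is negative.
    """
    peak = max(magnitudes)
    return [20 * math.log10(m / peak) for m in magnitudes]


if __name__ == "__main__":
    # Hypothetical amplitudes for one ear at successive rotation angles.
    one_ear = [0.10, 0.50, 1.00, 0.71, 0.05]
    print([round(v, 1) for v in normalize_db(one_ear)])
```

Because each ear is normalized to its own maximum, plots produced this way show the shape of each beam but not any difference in absolute sensitivity between the two arms.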
Clearly, the pattern of sensitivity of each beam is a function of the frequency of
the test signal. The array shows a wide beam pattern for low frequency signals and the pattern
narrows as signal frequency increases. For the 1000-Hz signal, the beam appears almost ±45°
about  the  axis  of  an  arm;  this  narrows  to  about  ±30°  for  signals  of  6.4  kHz.  In  addition,  side 
lobes  were  apparent  and  their  amplitude  was  a  function  of  the  frequency  of  the  test  signal.  At  1 
kHz,  the  array  showed  a  broad,  somewhat  lobular  pattern  of  sensitivity  outside  the  main  lobe, 
and  this  pattern  was  attenuated  about  20  dB  relative  to  the  main  lobe.  For  signals  of  2  and  3.2 
kHz,  the  side  lobes  were  more  pronounced  but  were  attenuated  by  about  25  to  30  dB.  At  4  and 
6.4  kHz,  the  lobe  pattern  was  still  more  complicated  but  again  was  attenuated  only  about  20  dB 
below  the  main  beam. 

We  expect  that  these  patterns  of  sensitivity  will  not  change  dramatically  as  the 
angular  separation  between  the  arms  is  changed.  There  may  be  interactions  between  the  arms  at 
small values of α, reflections and shadows for instance, but their net effect is probably small.

Based  on  our  measurement  of  the  sensitivity  patterns  of  the  arms  of  the  array  and 
on data supplied by Scanlon and Tenney (1994) of ARL, we calculated two metrics, the
directivity  index  and  noise  sensitivity,  which  characterize  narrow  band  arrays.  Stadler  and 
Rabinowitz  (1993)  calculated  these  metrics  with  one-third  octave  band,  articulation  index  weights 
to  characterize  the  quality  of  speech  provided  by  different  configurations  of  arrays.  For  a  single 
arm of the LDLD, the intelligibility-weighted directivity index is 10.1 dB and its
intelligibility-weighted noise sensitivity is 29.6 dB.
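For orientation, the plain (unweighted, single-frequency) directivity index of an axially symmetric pattern can be computed by numerical integration. This sketch is not the intelligibility-weighted metric of Stadler and Rabinowitz (1993), which combines one-third octave band values with articulation index weights; it only illustrates the underlying quantity. The function name `directivity_index_db` is hypothetical, and the pattern is assumed to be an amplitude gain normalized to 1 on axis.

```python
import math


def directivity_index_db(pattern, steps=20000):
    """Directivity index (dB) of a pattern that is symmetric about its axis.

    pattern(theta) returns amplitude gain, normalized to 1 at theta = 0,
    for theta in radians between 0 and pi.
    """
    d_theta = math.pi / steps
    integral = 0.0
    for i in range(steps):
        theta = (i + 0.5) * d_theta  # midpoint rule
        integral += pattern(theta) ** 2 * math.sin(theta) * d_theta
    # An omnidirectional pattern integrates to 2, giving 0 dB.
    return 10 * math.log10(2.0 / integral)


def cardioid(theta):
    return (1 + math.cos(theta)) / 2


if __name__ == "__main__":
    print(round(directivity_index_db(lambda t: 1.0), 2))  # omnidirectional
    print(round(directivity_index_db(cardioid), 2))       # single cardioid
```

A single cardioid element has a directivity factor of 3 (about 4.8 dB), so the additional directivity of an arm comes from the summation across its elements.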

The  Mannequin 

In  the  next  phase  of  this  study,  we  obtained  similar  data  for  a  representative 
human  head  and  torso.  We  chose  to  use  a  Knowles  Electronics  mannequin  for  acoustic  research 
(KEMAR)  because  there  are  reference  data  for  it  (Burkhard  &  Sachs,  1975)  and  we  could 
compare  our  measurements  to  those  made  at  other  laboratories.  Since  the  KEMAR  includes  both 
a  head  and  torso,  measurements  made  using  it  should  approximate  those  made  on  intact  human 
beings  more  accurately  than  recording  models  that  use  just  a  head  and  pinnae. 

These  measurements  were  made  in  the  same  fashion  as  those  made  using  the  array. 
The  KEMAR  was  placed  on  the  same  rotating  table,  roughly  in  the  same  position  within  the 
anechoic  room.  We  used  Primo  miniature  electret  microphones  to  record  the  KEMAR  response, 
and  these  were  mounted  at  the  center  of  the  blocked  ear  canal.  The  microphone  responses  were 
amplified and led outside the anechoic room to a computer that controlled the signals and stored
the resulting data. Again, the responses from each ear were recorded and analyzed at the same
time.


For  test  frequencies  of  1, 2,  3.2, 4,  and  6.4  kHz,  gain  measurements  are  presented 
in Figures 7 through 11. Compared to the array beam patterns of Figures 2 through 6, the
sensitivity  patterns  of  the  KEMAR  model’s  ears  are  much  broader.  The  human  model  has 
significant  sensitivity  off  to  each  side  as  well  as  behind  the  head,  while  the  array  was  not 
designed  to  be  sensitive  to  signals  from  these  directions.  When  both  ears  are  considered,  the 
pattern  of  sensitivity  shown  by  KEMAR  is  quite  broad  in  the  frontal  hemisphere.  This 
observation is the basis of our conclusion that (a) beam separations of α > 45° may approximate
the  patterns  of  interaural  differences  of  level  that  listeners  normally  expect,  and  (b)  perhaps  a 
parallel  arrangement  of  the  arms  should  be  tried.  Here,  we  would  recommend  a  separation  of  the 
arms  that  would  approximate  the  time  delay  produced  by  a  human  head. 


Figure 7. Gain plot for KEMAR at 1 kHz.

[Gain plot for KEMAR at 3.2 kHz; plot label: normalized at 85.8 dB.]
For the KEMAR head, sources of sound located off to a side will produce
interaural intensity differences of 20 to 25 dB, just as the array can produce, but the dynamics of
their growth as θ increases from 0° to 90° will be different. This is probably the main difference
between normal unaided listening and localization with the binaural array. Interaural level
differences produced by the human head are strongly frequency dependent and, like time
differences, tend to grow fastest at small values of θ.

There  are  some  data  that  have  been  useful  for  anticipating  the  performance  of  a 
binaural  microphone  array.  Kuhn  (1977)  has  measured  arrival  times  at  each  ear  and  has 
expressed  these  as  interaural  time  differences  produced  by  the  head,  while  Shaw  and  Vaillancourt 
(1985)  have  measured  relative  signal  levels  at  each  ear  for  a  large  number  of  listeners.  Hafter  and 
DeMaio (1975) and Hafter, Dye, Nuetzel, and Aronow (1977) have provided measurements of
our sensitivity to these quantities, and in addition, Hafter, Dye, Wenzel, and Knecht (1990) have
provided  a  recent  statement  of  our  understanding  of  how  interaural  time  and  level  cues  can 
interact. 

Interaural  Differences 

Our last physical characterization of the binaural array was to measure the magnitude and
group delay of its response as a function of α. This also was done while the array was situated
in a corner of the anechoic room with the loudspeaker about 3 m away. The arms of the array
were separated by either α = 45° or 0° and these measurements were made with θ = α. Thus
when α = θ = 0°, both arms were parallel facing the source, and when α = θ = 45°, they were
separated by 45° with the midline axis of the array pointing at +45°. Thus, data were collected
when either the left or the right arm of the array was nearer the source of the signal. In addition, as
part of our test development, these measurements were made on the KEMAR mannequin
configured as for the earlier gain measurements; these measurements for KEMAR were collected
with θ = 22.5°.

For  these  delay  and  magnitude  measurements,  the  signal  was  a  linear  frequency  sweep,  a 
“chirp.”  A  linear  sweep  places  equal  energy  within  all  analysis  bands  making  the  resolution  of 
measurement  constant  across  the  band  of  interest.  Our  chirp  started  at  100  Hz,  swept  to  10 
kHz,  and  then  passed  through  a  16-kHz  anti-aliasing  filter  before  being  led  to  a  Bose  10-cm 
loudspeaker (Model 1180385A). The resulting signals from both arms of the array were Fourier
transformed  to  4096-point  spectra  which  required  a  signal  rate  of  20  microseconds  per  sample 
point,  and  our  40-kHz  sampling  rate  produced  analysis  bins  39.06  Hz  wide.  One  hundred  chirps 
were averaged for the group delay measurements and they were presented at a rate of 12 chirps
per  second.  Group  delay  is  the  negative  first  derivative  of  the  unwrapped  phase  spectrum  and  to 
calculate  it  we  first  determined  the  phase  response.  We  unwrapped  phase  and  then  calculated 
both  the  group  delay  and  magnitude  for  the  left  and  right  ear  signals  for  the  sampling  intervals 
containing  the  1-,  2-,  3.2-,  4-,  and  6.4-kHz  components.  Usually,  there  was  not  enough  signal 
energy  for  stable  group  delay  estimates  for  the  highest  and  lowest  test  frequencies,  so  these  were 
excluded  from  consideration.  Our  measurements  of  group  delay  and  magnitude,  then,  were  made 
only over the range of 2 to 4 kHz by averaging the measurements made for the 2-, 3.2-, and 4-kHz
components. These measurements for both the array and the mannequin are presented in Table 1.
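The group delay calculation described above (phase response, unwrapping, negative derivative with respect to angular frequency) can be sketched as follows. This is an illustration under stated assumptions, not the analysis software used for the report: it uses a short naive DFT and delayed impulses as stand-in signals instead of averaged chirp responses, and the 64-point length is chosen only to keep the example fast. The function names are hypothetical. The interaural difference is formed as left minus right, following the convention stated earlier.

```python
import cmath
import math


def half_spectrum(x):
    """Naive DFT of a real sequence, returning bins 0 through N/2."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n // 2 + 1)]


def unwrap(phases):
    """Remove jumps of 2*pi so phase varies smoothly from bin to bin."""
    out = [phases[0]]
    for p in phases[1:]:
        step = p - out[-1]
        step -= 2 * math.pi * round(step / (2 * math.pi))
        out.append(out[-1] + step)
    return out


def group_delay(x, fs_hz):
    """Group delay (s) between adjacent bins: -d(phase) / d(omega)."""
    n = len(x)
    phase = unwrap([cmath.phase(c) for c in half_spectrum(x)])
    d_omega = 2 * math.pi * fs_hz / n  # angular frequency step per bin
    return [-(phase[k + 1] - phase[k]) / d_omega
            for k in range(len(phase) - 1)]


def delayed_impulse(n, delay_samples):
    """Stand-in test signal: a unit impulse delayed by a whole number of samples."""
    x = [0.0] * n
    x[delay_samples] = 1.0
    return x


FS = 40000.0  # sampling rate in Hz
left = group_delay(delayed_impulse(64, 5), FS)    # 5 samples = 125 microseconds
right = group_delay(delayed_impulse(64, 9), FS)   # 9 samples = 225 microseconds
interaural = [l - r for l, r in zip(left, right)]  # left minus right
```

With real measurements the phase estimate is noisy wherever the signal has little energy, which is why averaging across presentations and across neighboring frequency components is useful.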


Table  1 

Measured  Group  Delay  and  Magnitude  Differences 


Source location (θ)

-22.5°  0°  +22.5°  +45° 


Interaural  time  differences  in  seconds 
KEMAR  -.292  .098  +.319 


Array 

-.288 

-.292 

.043 

+.202 

+.187 

Interaural  magnitude  differences  in  dB 

KEMAR 

Array 

-16.17 

-6.48  -1.22 

-2.89 

+5.4 

Here,  problems  of  programming  and  the  limited  availability  of  equipment  kept  us  from 
collecting  complete  data.  In  the  table,  positive  values  for  interaural  differences  of  group  delay 
and  magnitude  response  reflect  the  left  ear’s  either  leading  in  time  or  having  the  greater  magnitude 
response. For the measurements of differences of group delay, the array's differences for θ = 45°
seem close to those for the KEMAR head for θ = 22.5°. The measurements on the array seem
asymmetrical in that there is a consistent difference of more than 80 microseconds between the
measurements made at θ = -45° and those made at θ = +45°. For this reason, the delay
measurements for θ = +45° were repeated. An inequity is also apparent in the measurements of
interaural magnitude differences. There is almost a 10-dB difference between the measurements
made  for  similar  clockwise  and  counterclockwise  rotations. 
Recall that the array was situated in a corner of ARL’s anechoic room during these
measurements, and these anomalies may be attributable to reflections from equipment mounting
hardware or from the array itself. Unfortunately, we were not able to continue this work to
resolve the issue. While these data are inconclusive, we feel that understanding the temporal and
intensive relations of signals passed through the left and right arms of the array will provide
useful design information, and this work should continue. Figures 2 through 6 provide a good
display of the magnitude response of the array as source azimuth is varied. Similar
measurements for inter-arm temporal differences are still needed. Certainly, the transformation
of signals passing through the array can be modeled, and in lieu of collecting more measurements
about  the  array,  we  have  taken  this  route.  Our  modeling  of  the  action  of  the  array  is  presented 
later. 

BEHAVIORAL  ASSESSMENT 
The  Minimum  Audible  Angle 

To  characterize  the  localization  sensitivity  of  listeners  using  the  array,  we  determined  the 
minimum  audible  angle  for  listening  aided  by  the  array.  Measured  first  by  Mills  (1958),  the 
minimum  audible  angle  can  be  used  to  represent  the  sensitivity  to  spatial  separation  of  sound 
sources  as  a  function  of  the  relative  positions  of  the  source  and  listener.  At  each  azimuth,  two 
loudspeakers, separated by a fixed distance, are used to present sounds sequentially and in a
two-interval, forced choice task, the listener is asked to judge which loudspeaker presented the signal
during  the  second  interval.  This  task  is  often  described  as  one  of  judging  movement  of  the  signal 
moving  from  the  left  position  to  the  right  one  or  the  reverse.  Typically,  fixed  loudspeaker 
separations  are  used  and  the  percentage  of  correct  responses  for  each  separation  is  measured  and 
then  threshold  is  estimated  by  interpolation.  This  was  the  method  used  in  our  first  behavioral 
experiment. 

Method 

The  minimum  audible  angle  measurements  were  made  in  ARL’s  hostile 
environment  simulator,  an  acoustical  test  facility  at  Aberdeen  Proving  Ground.  The  test  room  is 
17.3  m  by  13.4  m  and  6.7  m  high.  The  walls  are  Industrial  Acoustics  sound-deadening  panels 
that  produce  a  room  with  a  background  noise  level  of  38  dB  sound  pressure  level  (SPL),  or  17 
dB(A),  and  a  reverberation  decay  time  of  0.4  s.  The  binaural  array  was  placed  in  the  center  of  a 
circle with a radius of 5.5 m, and the sound sources (a pair of Bose 10-cm loudspeakers, Model
1180385A) were placed successively at azimuths of 0°, 15°, 30°, 45°, or 60°. The loudspeakers
were mounted at a height of 1.37 m and could be separated by horizontal distances of 12.5 to 84
cm (angular separations of 1.3° to 7.7° measured from the center of the loudspeaker cones). The
signals were condensation clicks set to a duration of 0.1 millisecond and were calibrated with a
spectrum analyzer and sound level meter. They were set to 30 dB SPL, measured at the position
of the array, and the two loudspeakers produced signals that were equal (±0.2 dB) at all octave
bands except the 8-kHz band where they differed by not more than ±0.5 dB. For this and the
next experiment, the gain of the array was set to its maximum value, the same used for the
measurements shown in Figures 2 through 6. Two subjects were tested and they listened either
normally or aided by the binaural array. For both conditions, the listener and the array faced 0°
when the signals were presented. One subject had considerable experience listening to binaural
signals, while the other was a naive listener. The listeners used a hand-held, three-key mouse
controller  to  initiate  a  testing  session  as  well  as  to  indicate  where  the  signal  appeared. 

The  minimum  audible  angle  was  estimated  to  be  the  loudspeaker  separation  that 
produced  75%  correct  listening  performance.  Separations  for  the  loudspeakers  were  selected  to 
bracket  this  value,  and  the  minimum  audible  angle  for  a  given  azimuth  was  estimated  before 
testing  continued  at  another  azimuth.  When  the  listener  initiated  a  trial,  a  100-millisecond  quiet 
period  preceded  the  signals.  Then  the  clicks  were  presented,  separated  by  150  milliseconds,  and 
the  listeners  had  unlimited  time  to  respond  before  the  next  presentation.  A  response  started  the 
next  presentation  sequence.  Sets  of  50  trials  were  under  computer  control  and  the  listener  could 
initiate  the  set  with  the  same  hand-held  key  set  used  for  responding.  For  all  conditions  tested, 
the  estimates  of  percentage  correct  were  based  on  100  trials,  and  the  threshold  loudspeaker 
separation  was  estimated  based  on  at  least  two  such  measurements. 
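The report does not say how the 75% point was read from the percent-correct scores. One common approach, sketched here only as an illustration, is linear interpolation between the two loudspeaker separations that bracket the criterion. The function name and the example scores are hypothetical and are not data from this study.

```python
def threshold_separation(separations_deg, percent_correct, criterion=75.0):
    """Linearly interpolate the separation that yields `criterion` percent correct.

    Assumes percent correct rises with separation across the bracketing pair.
    """
    points = sorted(zip(separations_deg, percent_correct))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 <= criterion <= y1 and y1 != y0:
            # Straight-line interpolation between the bracketing points.
            return x0 + (criterion - y0) * (x1 - x0) / (y1 - y0)
    raise ValueError("criterion is not bracketed by the data")


# Hypothetical scores (each nominally based on 100 trials), for illustration only.
example_separations = [2.0, 3.0, 4.0, 5.0]
example_scores = [60.0, 70.0, 80.0, 90.0]
example_threshold = threshold_separation(example_separations, example_scores)
```

With these made-up scores the criterion falls halfway between the 3° and 4° separations. Averaging two or more such estimates would correspond to the "at least two such measurements" mentioned above.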

Results 


The measured minimum audible angles are presented in Figure 12. While several
values of α were tested informally, fairly complete data were obtained for α = 45° and these are
shown in the figure. At θ = 0°, the minimum audible angle was approximately 3.5° for both
unaided listening and for aided listening through the array. This is consistent with previous
measurements and is to be expected for the low level signals used here. As the loudspeakers were
moved away from the midline, the threshold separation increased and, for unaided listening, had
doubled by θ = 60°. The minimum audible angle seemed to increase less rapidly for listening
aided by the binaural array, as the threshold separations at θ = 30° and 45° were 25% and 50%
lower for the aided condition. Thus, auditory spatial acuity seemed to benefit by listening over a
binaural array, but there was a problem for the minimum audible angle task when applied to
listening aided by an array.
Figure 12. Minimum audible angle as a function of loudspeaker azimuth.


Because  the  array  is  not  a  simple  microphone,  there  is  a  confounding  for  any  task  that 
employs  spatially  separated  loudspeakers.  When  a  wave  front  is  not  normal  to  the  long  axis  of 
the  array,  the  responses  of  the  microphones  are  spread  in  time,  that  is,  the  processing  delays  are 
set  to  sum  on-axis  wave  fronts  arriving  from  sources  at  an  azimuth  of  0°.  The  greatest  spread 
will  be  for  wave  fronts  arriving  from  directly  behind  the  array,  but  significant  temporal  spread 
will  occur  for  wave  fronts  hitting  the  array  broadside.  For  wave  fronts  from  sources  located  at 
90° or 270°, for example, the time between the individual microphone responses will be the
distance separating the individual microphones divided by the speed of sound. In this case, the total spread
of the responses would be close to 1 millisecond. The separation of individual microphone
responses, because of delays built into the array, became a problem for the minimum audible angle
task because threshold loudspeaker separation gets large as the sources are moved away from the
midline. As large distances between the loudspeakers were required for higher values of θ, the
wave  front  from  each  loudspeaker  arrived  at  the  array  at  angles  sufficiently  different  to  create 
perceptible  differences  in  the  clicks.  Differences  of  the  arrival  time  of  the  energy  of  the  impulses 
have been shown to be a reliable cue for detection (see Patterson and Green, 1970; Ronken, 1970;
or Green, 1971). At the largest loudspeaker separations, possibly by θ = 45° and certainly by θ
= 60°, the individual clicks sounded like “tick” and “tock”. Thus, the array produced a
characteristic  “coloring”  of  each  signal  that  depended  on  the  location  of  the  loudspeaker  that 
produced  it.  The  discrimination  of  these  clicks  could  easily  be  based  on  differences  of  phase 
(timbre)  rather  than  loudspeaker  location.  What  are  needed  now  are  psycho-acoustic  data  about 
how  the  ear  interprets  short-term  reverberation  created  in  multi-microphone  arrays.  Thus, 
processing  by  the  array  produced  qualitative  differences  in  the  clicks  that  could  obscure  the  loss 
of angular sensitivity at large values of α, and our minimum audible angle task was compromised.
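The figure of "close to 1 millisecond" for a broadside wave front can be checked with a short calculation. The sketch below uses the nine microphones and 4.2-cm spacing given in the introduction; the speed of sound (343 m/s) is an assumed room-temperature value that the report does not state, and the function name is illustrative.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed value; not given in the report
MIC_SPACING_M = 0.042       # "about 4.2 cm" between adjacent microphones
N_MICS = 9                  # microphones in each arm


def broadside_spread_s(n_mics=N_MICS, spacing_m=MIC_SPACING_M, c=SPEED_OF_SOUND_M_S):
    """Time spread of the delayed microphone outputs for a broadside wave front.

    A broadside wave front reaches every microphone at the same instant, so
    the spread is set entirely by the built-in steering delays.
    """
    per_gap_s = spacing_m / c        # steering delay between neighboring microphones
    return (n_mics - 1) * per_gap_s  # spread between the first and last outputs


spread_ms = 1000.0 * broadside_spread_s()
print(f"{spread_ms:.2f} ms")
```

Under these assumptions the spread comes to about 0.98 ms, in line with the value quoted in the text.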

We  therefore  decided  to  use  a  task  requiring  only  one  loudspeaker. 

The  Limit  of  Lateralization 

Listening  with  this  binaural  array  is  similar  to  listening  with  headphones  in  that  the 
acoustic  waveform  is  not  affected  by  the  listener’s  pinnae,  head,  or  body.  In  addition,  dynamic 
changes  of  interaural  parameters  are  independent  of  muscular  feedback  from  movements  of  the 
head.  During  such  circumstances,  auditory  images  are  internalized  and  differences  of  interaural 
time  and  intensity  move  the  image  laterally  away  from  the  midline  and  toward  the  ear  they  favor. 
When  the  signals  from  each  arm  of  the  binaural  array  are  led  independently  to  the  two  ears,  the 
images  of  sounds  amplified  by  the  array  are  internalized  and  they  move  along  the  interaural  axis 
as the real location of the sound source shifts in azimuth relative to the array. Obtaining usable
information  about  the  azimuth  of  the  source  is  still  relatively  easy  if  the  listener  knows  where  the 
array  is  pointed,  and  with  minimal  practice,  the  position  of  the  internal  sound  image  can  be 
centered  to  guide  dynamic  pointing.  Because  the  range  of  lateral  positions  seemed  larger  than  the 
range  of  movement  of  the  array  (the  magnification  mentioned  earlier),  we  decided  to  relate  the 
extent of image movement to the angular separation (α) of the arms of the array.

Imagine that the array were rotated clockwise past the loudspeaker shown at 0° in Figure
1. Because differences of both interaural time and level would favor the right arm first, the sound
image  would  be  lateralized  to  the  right  ear  and  then  would  move  through  center  and  over  to  the 
left  ear  as  the  right  arm  swept  past  the  loudspeaker  and  interaural  time  and  intensity  differences 
favored  the  left  arm.  If  the  sweep  were  started  at  an  azimuth  distant  from  the  loudspeaker,  the 
listener  would  also  hear  the  growth  of  loudness  as  the  signal  entered  the  right  beam  just  as  the 
decline of intensity would be apparent when the signal left the beam of the left arm of the array.
The  perception,  then,  is  of  an  increase  of  loudness  for  a  fully  lateralized  image;  then  a  range  of 
lateral  movement  is  possible  for  the  image,  and  finally,  it  is  lateralized  as  far  to  the  opposite  side 
as  is  possible  and  loudness  declines. 


The  array  was  pivoted  at  the  point  where  its  arms  were  connected,  allowing  movement  in 
the  horizontal  plane,  so  we  could  define  a  task  where  listeners  could  adjust  the  array  to  point  to 
the  lateral-most  position  for  image  movement.  For  instance,  there  is  almost  a  linear  relation 
between  the  perceived  lateral  location  of  a  sound  image  and  interaural  level  differences  as  great  as 
10  to  12  dB  (Yost  &  Hafter,  1988).  Beyond  that  point,  changes  of  lateral  position  grow  less 
rapidly  with  interaural  differences  of  signal  level.  Hafter  and  Kimball  (1980)  showed  binaural 
interaction  at  differences  of  interaural  level  above  30  dB,  but  the  associated  changes  of  lateral 
position  were  extremely  small.  With  the  binaural  array,  careful  listening  seemed  to  show  a  “dead 
band”  between  the  limit  of  lateral  movement  and  the  decline  of  loudness,  but  there  seemed  to  be 
consistency  when  listeners  judged  where  lateral  movement  stopped.  Thus,  for  a  wide  range  of 
angular  separations  of  the  arms  of  the  array,  we  could  measure  the  extent  of  lateral  movement  of 
the  internalized  image  as  the  azimuth  of  the  midline  of  the  array  when  the  image  reached  its  lateral 
limit. This we did for α = 15° to α = 180°, using θ, the azimuth of the midline of the array, as
the  measure  of  the  limit  of  image  movement. 

Method 

Two  subjects  were  asked  to  rotate  the  array  until  they  could  judge  a  lateralized 
position,  either  to  the  left  or  right,  where  image  movement  stopped.  The  starting  position  of  the 
array  (relative  to  the  loudspeaker)  was  varied  and  the  listeners  made  their  judgments  with  their 
eyes  closed.  These  measurements  were  also  made  at  the  hostile  environment  simulator  in  the 
same  space  that  was  used  for  the  minimum  audible  angle  study.  The  same  loudspeaker  and 
mounting stand were used. The main difference was that the array-to-loudspeaker distance was
reduced to 4.26 m. During these measurements, the actual location of the loudspeaker was varied
so that judgments of θ remained within a pre-measured quarter circle. Typically, the
loudspeaker was located between +30° and +90° relative to an arbitrary zero point, and starting
positions for the array varied between -30° and +120°. Listeners were asked to rotate the array
until the auditory image remained stationary in its right-most position, that is, they moved the
array to place the loudspeaker near the right arm of the array at the position where image
movement stopped. When the listener judged an azimuth for the limit of lateral image movement,
θ was estimated by sighting down an arm of the array while a confederate located a marker along
the pre-measured quarter circle. We calculated θ from the position of the marker and then
adjusted it by ±α/2, depending upon which arm was chosen for the sighting. These
measurements of θ as a function of α are presented in Figure 13.
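The ±α/2 adjustment can be written out explicitly. The sketch below assumes that azimuth increases toward the right and that each arm points α/2 to its own side of the array midline; the report does not state the sign convention, so the function name and the convention are both illustrative assumptions.

```python
def midline_azimuth_deg(marker_deg, alpha_deg, sighted_arm):
    """Convert the azimuth sighted along one arm into the azimuth of the midline.

    Assumed convention: azimuth grows to the right, the right arm points at
    midline + alpha/2 and the left arm at midline - alpha/2.
    """
    half_separation = alpha_deg / 2.0
    if sighted_arm == "right":
        return marker_deg - half_separation
    if sighted_arm == "left":
        return marker_deg + half_separation
    raise ValueError("sighted_arm must be 'right' or 'left'")
```

For example, with α = 60° a marker at 70° sighted along the right arm would place the midline at 40° under this convention.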
Figure 13. Judgments of the limit of lateral image movement.


Results 

Obtained  results  support  the  notion  that  the  angle  of  beam  separation  directly 
affects  sound  lateralization  when  listening  with  the  array.  Measurements  of  sensitivity,  such  as 
the  minimum  audible  angle,  may  not  show  dramatic  improvements  because  sensitivity  to 
interaural  differences  is  maximal  for  sounds  arriving  from  azimuths  close  to  the  midline.  Here,  the 
dynamic range for image lateralization increased with α up to 90°. When an α = 180° condition
was tried, the range decreased to that seen when α was between 15° and 30°. Linear functions
(shown in Figure 13) were fitted to the data for α = 15° to α = 90°, and these seem to support
the argument that the listeners were selecting a constant proportion of α as the basis of their
lateral position judgments. That is, a function that relates lateral image position to θ + kα (in
which k is > 0 and < 1.0) would seem to be an appropriate model for these data. There are slight
differences  of  slope  to  the  fitted  functions,  but  these  measurements  are  few  and  noisy. 
We suspect that wide separation of the beams, on the order of α = 45° to 60°, will
make  the  spatial  sensitivity  pattern  of  the  array  more  like  that  of  the  human  head,  at  least  in  the 
forward  direction.  Unfortunately,  there  are  no  measurements  similar  to  these  for  unaided 
listening.  Measurements  of  the  lateral  position  of  a  sound  image  for  sounds  containing  the 
temporal  and  level  differences  characteristic  of  sounds  presented  from  different  source  azimuths 
would  provide  a  baseline  for  comparison,  but  such  data  are  not  available.  Given  that  the  array 
will  be  used  to  point  toward  a  source  of  acoustic  energy,  the  broader  the  range  of  source  locations 
that  can  be  differentiated,  the  better  will  be  pointing  and  tracking  by  listeners. 

The  Accuracy  of  Pointing 

Last,  we  examined  the  effect  that  angular  separation  of  the  array  arms  has  on  the 
localization  of  sounds  when  the  task  was  to  move  the  array  back  and  forth  in  order  to  “center” 
the  internalized  image  of  the  source  of  sound.  Here,  the  interest  was  in  making  listening  through 
the  array  similar  to  normal  hearing,  so  the  total  gain  of  the  array  was  set  to  assure  equal  loudness 
for  sounds  delivered  to  aided  and  unaided  ears.  In  addition,  the  target  sound  was  presented  with 
a continuous background masking noise. To keep the task of listening through the array similar
to  normal  listening,  the  array  was  mounted  on  the  subject’s  head  and  the  task  was  to  point 
toward  the  source  of  sound.  Thus,  listeners  could  engage  in  whatever  scanning  strategies  they 
might  choose  in  order  to  orient  toward  the  loudspeaker. 

Subjects 

Twelve  subjects  served  as  listeners.  They  were  recruited  from  a  local  community 
college  and  had  normal  hearing;  their  audiograms  were  within  15  dB  of  audiometric  zero  and  they 
had  no  history  of  auditory  abnormality. 

Signals 


The  test  signal  was  a  digitized  recording  of  the  closure  of  an  AK-47  rifle  bolt  that 
had  a  duration  of  0.84  second.  The  sounds  were  stored  in  and  played  from  the  memory  of  an 
IBM/486 computer under the control of a Tucker Davis Technology (System II) signal-processing
system. The bolt closures were set to one of three levels (65, 75, or 85 dB SPL) and
presented once every 2 seconds until the listener indicated a judgment. The signals were
delivered to ER-3A earphones (Etymotic Research) and led through ER1-14 foam plugs inserted
in the ear canals. The continuous pink noise masker was set to 80 dB(A). All levels were
adjusted with a precision of 0.5 dB using peak (for the bolt click) and root mean square (rms) (for
the  noise)  readings. 

Method 

To  equate  listening  with  and  without  the  array,  the  gain  of  the  array  channels  was 
set  using  a  loudness  balancing  procedure.  Three  subjects  listened  to  the  target  sound  with  aided 
and  unaided  ears,  and  the  gain  of  the  array  was  adjusted  to  produce  equal  loudness  for  both 
conditions. Three array separations (α = 30°, 45°, and 60°) and an unaided listening (without the
array)  condition  were  combined  with  three  signal-to-noise  ratios  (SNRs)  (+5,  -5,  and  -15  dB)  to 
form  a  factorial  set  of  12  conditions  that  were  presented  four  times.  The  12  conditions  of  array 
separation  and  SNR  were  completely  counterbalanced  across  listeners. 
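The size of the design follows directly from crossing the factors. The sketch below enumerates the 12 conditions and the 48 trials per listener; the cyclic (Latin square) ordering is only one assumed way of counterbalancing and is not necessarily the scheme the authors used. All names are illustrative.

```python
from itertools import product

LISTENING_CONDITIONS = ["unaided", "alpha=30", "alpha=45", "alpha=60"]
SNRS_DB = [5, -5, -15]
REPETITIONS = 4

# 4 listening conditions x 3 SNRs = 12 cells of the factorial design.
conditions = list(product(LISTENING_CONDITIONS, SNRS_DB))


def condition_order(listener_index):
    """Cyclic Latin-square order: each listener starts one condition later.

    This is an assumed counterbalancing scheme, shown for illustration.
    """
    n = len(conditions)
    return [conditions[(listener_index + k) % n] for k in range(n)]


trials_per_listener = len(conditions) * REPETITIONS  # 12 x 4 = 48
```

With 12 listeners, the cyclic order places every condition in every serial position exactly once across the group.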

Procedure 

Testing  was  conducted  in  ARL’s  hostile  environment  simulator.  Listeners  sat  in  a 
swivel  chair  approximately  at  the  center  of  the  hostile  environment  simulation  room,  facing  a 
loudspeaker mounted 1.5 m high at a distance of 4.88 m and situated at the point labeled 0° in
Figure 1. A second loudspeaker mounted about 46 cm directly above the subject’s head was used
to  present  the  masking  noise.  The  binaural  array  and  an  eye-safe  laser  were  mounted  on  an 
adjustable  headband.  The  array  was  mounted  horizontally  and  the  laser  was  oriented  along  the 
axis  of  the  array,  but  its  beam  was  depressed  to  aim  at  a  grid  on  the  floor  in  front  of  the  listener. 

A  quarter  circle  with  radial  lines  marked  every  2°  was  drawn  with  the  chair  at  its  center.  For  each 
judgment,  the  position  of  the  laser  beam  on  the  grid  on  the  floor  indicated  the  direction  of  the 
array.  With  this  method,  angles  could  be  measured  to  a  precision  of  0.5°. 

During  data  collection,  the  subjects  were  blindfolded.  A  trial  consisted  of  activating  the 
masker,  rotating  the  chair  to  disorient  the  subjects,  activating  the  test  signal,  and  allowing  the 
listener  to  rotate  the  chair  (approximately  back  to  0°)  and  point  the  array  at  the  signal 
loudspeaker.  A  hand-held  switch  was  used  by  the  listener  to  activate  the  eye-safe  laser  indicator. 
The  azimuth  of  the  array  was  then  estimated  from  the  position  of  the  laser  beam  on  the  calibrated 
quarter  circle.  The  angular  distance  between  the  actual  and  estimated  loudspeaker  positions  was 
used  as  the  measure  of  localization  precision.  Each  subject’s  set  of  48  trials  was  presented 
within  a  single  testing  session. 
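The error measure described above can be sketched as the unsigned angular difference between the pointed azimuth and the loudspeaker azimuth, averaged over judgments. The function names are illustrative, and the wraparound handling is an added safeguard rather than something the report describes.

```python
def unsigned_error_deg(pointed_deg, target_deg=0.0):
    """Absolute angular distance between pointed and actual azimuth, in degrees."""
    # Wrap the signed difference into [-180, 180) before taking its magnitude.
    signed = (pointed_deg - target_deg + 180.0) % 360.0 - 180.0
    return abs(signed)


def mean_unsigned_error_deg(pointed_azimuths_deg, target_deg=0.0):
    """Average unsigned pointing error over a set of judgments."""
    errors = [unsigned_error_deg(p, target_deg) for p in pointed_azimuths_deg]
    return sum(errors) / len(errors)
```

Using unsigned errors keeps leftward and rightward misses from canceling when judgments are averaged.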
Results


A  repeated  measures  analysis  of  variance  employing  Greenhouse-Geisser 
corrections was used to test the effects of array separation, SNR, and repetition on the errors of
pointing  at  the  signal  loudspeaker. 

Only the array angle variable showed a significant effect (F (3, 72) = 6.26, p <
0.01) on localization accuracy. Post hoc comparisons showed that performance during the normal
listening condition was better than during the pooled array conditions (F (1, 24) = 13.14, p < 0.01), that
pointing with α = 60° was more accurate than with the pooled α = 30° and 45° conditions (F (1,
24) = 7.33, p < 0.05), and that pointing with α = 60° was better than that during the α = 45°
condition (F (1, 24) = 6.03, p < 0.05). There was no statistical difference between unaided
listening and listening with the array with α = 60°.

Figure  14  displays  the  average  of  the  unsigned  errors  of  localization  judgment  for  unaided 
listening (with bare ears) and for the aided conditions of α = 30°, 45°, and 60°. Repetitions were
not  a  significant  source  of  variance,  so  for  the  figure,  data  were  averaged  across  listeners  and 
repetitions.  Each  data  point  therefore  represents  an  average  of  48  judgments. 


Figure 14. Pointing accuracy for aided and unaided conditions. (Horizontal axis: listening
condition, unaided or array separation angles of 30°, 45°, and 60°.)
DISCUSSION


The results shown in Figure 14 indicate that pointing accuracy was best with an array
separation of α = 60°. Because the array was movable, movement produced dynamic changes in
interaural cues much like those produced by head movements during unaided listening. This made
it possible for listeners to center an internalized sound image. In addition, this was probably
why localization performance was not dramatically degraded by the binaural array and why
signal level had a minimal effect except possibly at α = 60°. Unaided pointing error averaged
about 4°, about the same as did error for α = 60° when the signal level was 5 dB
above  that  of  the  noise.  These  values  seem  large  as  localization  error  without  noise  can  be  as  low 
as  1°  for  sources  near  the  midline  (Mills,  1958).  However,  Good  and  Gilkey  (1996)  have  shown 
that  sound  localization  (with  the  head  stationary)  becomes  increasingly  difficult  as  the  SNR  goes 
below  +5  dB.  Relative  to  unaided  listening,  localization  error  roughly  doubled  when  the  array 
was  used,  but  we  suspect  that  SNR  had  little  effect  because  the  listeners’  task  was  to  center  an 
image  rather  than  to  judge  its  absolute  position.  As  they  rotated  the  array,  listeners  could 
observe  changes  of  the  direction  of  image  movement  and  could  adjust  the  position  of  the  array  to 
be close to θ = 0°. The poorer adjustment performance for α = 30° and 45° indicates that
inadequate  spatial  resolving  power  is  provided  by  those  angular  separations  of  the  array. 

Modeling  the  Array 

When  we  were  unable  to  finish  our  physical  characterization  of  the  array,  we  felt  that 
knowledge  about  the  summation  of  the  responses  of  the  individual  microphones  in  the  array  was 
going  to  provide  insight  about  the  information  it  provides  listeners.  In  particular,  we  felt  it  would 
be  useful  to  depict  the  interaural  temporal  and  level  responses  produced  by  the  array  when  sound 
wave  fronts  arrived  from  different  azimuths.  The  processing  delays  for  the  array  are  set  to  sum 
on-axis wave fronts arriving from sources located at an azimuth of 0°. When signals arrive from
other directions, these delays are incorrect by an amount of

(d/s) (cos [θ - α] - cos α)     (1)

in which s is the speed of sound and d is the distance between adjacent microphones. To provide
some insight about changes in the information listeners have for their localization judgments, we
have  modeled  the  array’s  processing  (much  as  did  Liu  &  Sideman,  1996)  as  passing  a  definable 
waveform  over  two  independent  linear  arrays  with  adjustable  length,  microphone  placement,  and 
angular separation. Assuming that ideal condensation pulses are presented over a signal
loudspeaker located 2.0 m in the right far field, the output waveform of each arm of the array was
calculated  and  Fourier  transformed  so  that  the  magnitude  spectrum  and  group  delay  of  each  arm 
could  be  determined.  We  compared  the  magnitude  and  group  delay  response  of  each  arm  of  the 
array  to  establish  the  interaural  differences  of  time  and  level  that  are  contained  in  signals  from  the 
array.  For  this  modeling,  idealized  stimulation  and  recording  conditions  were  assumed. 
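The modeling approach described above can be illustrated with a minimal delay-and-sum sketch. It is not the authors' model: it assumes a plane wave, identical omnidirectional microphones (the cardioid patterns are ignored), arms hinged at a common rear pivot with axes at ±α/2 from the midline, and a speed of sound of 343 m/s. The group delay is averaged with power weighting, an added choice that avoids blowups at response nulls. All function and variable names are hypothetical.

```python
import numpy as np

C = 343.0        # speed of sound in m/s (assumed)
N_MICS = 9       # microphones per arm (from the report)
SPACING = 0.042  # microphone spacing in m (from the report)


def arm_band_response(theta_deg, arm_axis_deg, freqs_hz):
    """Band-averaged magnitude and power-weighted group delay (s) of one arm."""
    psi = np.deg2rad(theta_deg - arm_axis_deg)  # source angle relative to the arm axis
    r = SPACING * np.arange(N_MICS)             # microphone distances from the pivot
    # Residual delay of each microphone output after the on-axis steering delays:
    # zero for an on-axis source, growing as the source moves off axis.
    t = (r / C) * (1.0 - np.cos(psi))
    omega = 2.0 * np.pi * np.asarray(freqs_hz, dtype=float)[:, None]
    phasors = np.exp(-1j * omega * t[None, :])
    H = phasors.sum(axis=1)                      # summed response to an ideal pulse
    G = (phasors * t[None, :]).sum(axis=1)       # j * dH/domega
    mean_magnitude = np.abs(H).mean()
    # Group delay is Re(G / H); weighting by |H|^2 gives Re(conj(H) * G).
    delay = np.real(np.conj(H) * G).sum() / (np.abs(H) ** 2).sum()
    return mean_magnitude, delay


def interaural_differences(theta_deg, alpha_deg, freqs_hz=None):
    """Return (level difference in dB, delay difference in s), positive toward the right arm."""
    if freqs_hz is None:
        freqs_hz = np.linspace(500.0, 6000.0, 221)  # 0.5 to 6.0 kHz band
    mag_r, delay_r = arm_band_response(theta_deg, +alpha_deg / 2.0, freqs_hz)
    mag_l, delay_l = arm_band_response(theta_deg, -alpha_deg / 2.0, freqs_hz)
    level_difference_db = 20.0 * np.log10(mag_r / mag_l)
    delay_difference_s = delay_l - delay_r  # positive when the right arm leads
    return level_difference_db, delay_difference_s
```

Calling `interaural_differences(theta, alpha)` over a range of source azimuths gives curves of the general kind discussed for Figure 15, though the numbers depend on the simplifying assumptions listed above and should not be read as a reproduction of the reported results.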

Figure 15 shows interaural differences of magnitude and group delay for θ = 0° to 60°.

For  the  purpose  of  generating  the  figure,  values  for  magnitude  and  group  delay  for  each  arm  were 
averaged  between  0.5  and  6.0  kHz  and  then  interaural  differences  were  formed  to  produce 
positive values for both magnitude and group delay. For the three values of α used in the
pointing study, this produced model results that correspond well to selected measurements of
magnitude and group delay shown in Table 1.

Figure 15 shows interaural differences that, for all settings of α, grow as the signal
loudspeaker moves away from the axis of the array. The growth of response magnitude remains
orderly until θ = 30° to 40°; then the interaural differences of magnitude begin to decline. Given
that the sound source is to the right of the array, the response of the right arm leads that of the
left arm by α degrees, and interaural magnitude differences favor localization to the right of the
midline until θ = 70° to 90°. Beyond that point, differences of magnitude favor one arm or the
other depending upon the interaction of the wave front with each arm. At θ = 180°, the
responses  of  the  arms  are  again  equal. 

Figure  15  includes  measurements  of  the  interaural  level  difference  (ILD)  for  a  3.0-kHz 
signal taken from Shaw and Vaillancourt (1985). For all values of α, ILDs develop faster for the
array  than  for  the  human  head  as  a  signal  source  moves  away  from  the  midline.  This  causes  the 
array  to  act  as  an  auditory  lens,  expanding  auditory  space  about  the  midline.  Thus,  a  binaural 
array  with  arms  that  are  not  parallel  pits  the  high  gain  provided  by  each  arm  against  the  likelihood 
of  distorted  localization  judgments.  Shinn-Cunningham,  Durlach,  and  Held  (1996)  accomplished 
a  similar  expansion  by  distorting  the  relation  between  source  azimuth  and  the  particular  head- 
related  transfer  function  selected  for  listening  to  that  source  over  headphones.  Interaural 
differences  of  both  time  and  level  produced  this  distortion  because  it  was  created  using 
measurements  made  on  a  human  head.  In  the  binaural  array,  distortion  is  produced  by  the 
greater-than-normal  gain  associated  with  each  arm. 
Figure 15. Calculated interaural differences of magnitude and group delay.


In contrast, the differences of group delay are more orderly; they reach a maximum by θ
= 80° to 100° and then decline to no interaural differences by θ = 180°. The delay for each arm,
of course, continues to increase to reach a maximum of more than a millisecond at θ = 180°. The
group delay produced by passing sound through the array is less than that produced by the
human head (Kuhn, 1977), although the growth of group delay for α = 60° closely matches that
of the head until the azimuth of the source reaches 20° to 25°.
Figure 15 does not show the complicated behavior of magnitude and group delay beyond
θ = 60° because by that point, the signal loudspeaker is out of the sensitivity beam of either arm
and input is attenuated relative to sources within the beams. Our listeners manipulated the
binaural array to “capture” the signal loudspeaker within the range of θ = ±40° where the cues
provided by the array are consistent with changes produced by the human head. We feel that the
orderliness of the magnitude and delay responses within that range allows listeners to null image
movement and point the array at an acoustic target. Both magnitude and group delay show
orderly growth with values of α, and the array separation that produced the most rapid growth
of interaural differences (α = 60°) also supported the most accurate localization performance.

CONCLUDING  THOUGHTS 

From  all  these  studies,  it  is  clear  that  localization  performance  is  best  with  the  wider  array 
separation  angles.  Narrow  separation  angles  (of  45°  or  less)  were  considered  because  of  physical 
constraints  on  the  width  of  the  present  design  of  the  LDLD  when  it  is  mounted  on  a  rifle.  These 
constraints  can  be  relaxed  somewhat  when  digital  beam-forming  techniques  are  used.  In  our 
study, pointing with α = 60° was superior to any other array separation tested and was
indistinguishable from unaided listening. In addition, our feeling is that the combined beam
pattern of the array should approximate that for the human head for at least ±40° about the
midline. There are few data to support the design of arrays for binaural listening, and until more
data  are  available,  good  advice  is  that  a  binaural  array  should  provide  patterns  of  interaural 
differences  close  to  those  listeners  expect.  The  poor  localization  performance  seen  with  narrow 
beam  separations  is  probably  attributable  to  the  array’s  distorting  the  interaural  intensity 
information  used  for  the  judgments. 

It  seems  clear  that  two  aspects  of  the  potential  use  of  binaural  arrays  should  be  part  of 
the  assessment  of  future  designs.  First,  because  of  the  probable  trade  of  array  gain  and  the 
accuracy  of  static  localization,  future  work  should  measure  the  accuracy  of  localization  with  the 
array  stationary.  Here,  the  array  should  be  stationary  and  pointed  at  0°,  and  the  listener  would 
indicate  which  of  several  loudspeakers  presented  a  brief  signal.  The  loudspeakers  should  be 
located  on  an  arc  equidistant  from  the  array.  This  arc  should  span  a  range  of  source  azimuths  at 
least to ±40° about the midline. Our predictions are that (a) best localization performance would
be  associated  with  a  loudspeaker  located  at  the  midline,  (b)  localization  accuracy  should  decline 
rapidly  as  the  signal  location  departs  from  the  midline,  and  then  (c)  accuracy  should  slowly 
improve  when  the  signal  is  presented  from  the  more  lateral  loudspeakers.  Such  measurements 
would  show  the  distortion  of  azimuth  judgments  caused  by  a  particular  design. 
Second, an assessment of localization with a binaural array should include a dynamic
pointing task, similar to the one in our third behavioral experiment. Here, our predictions follow
the findings of our third study, that an optimal array separation angle should be associated with best
pointing.  Too  narrow  a  separation  of  the  arms  of  the  array  would  hamper  accurate  pointing  as 
would  too  wide  a  separation.  In  addition,  wide  separations  will  probably  be  impractical  for 
arrays  intended  to  be  mounted  on  portable  equipment,  such  as  rifles. 

Both  of  the  tasks  just  described  should  be  used  to  assess  particular  designs  for  binaural 
arrays.  We  have  emphasized  using  the  LDLD  as  a  scanning  device  for  locating  continuous  or  long 
duration  signals.  In  the  real  world,  very  brief  sounds  often  carry  important  information  as  well, 
and  soldiers  may  need  to  accurately  locate  their  sources.  From  our  discussion  so  far,  it  seems 
clear  that  arrays  can  be  designed  that  will  favor  one  of  these  tasks  over  the  other,  and  so  both 
measurements  are  needed  for  system  evaluation. 

Our  results  can  be  characterized  as  showing  that  while  our  particular  binaural  array  did 
not  improve  localization  performance  beyond  that  obtained  with  unaided  listening,  it  did  provide 
adequate  localization  ability  for  signals  that  are  normally  barely  detectable.  Perhaps  more 
important,  it  can  be  used  to  enhance  sound  localization  in  situations  where  normal  sound 
localization  is  impossible.  Enclosures  and  helmets  or  other  protective  gear  hamper  normal 
listening,  and  a  binaural  array  will  not  only  improve  the  SNR  along  a  particular  axis,  but  it  will 
also  enhance  the  localization  of  sounds  when  conditions  normally  make  this  difficult. 

The  binaural  array  is  intended  for  use  both  in  quiet  (for  surveillance)  and  in  noise  (such  as 
in  moving  vehicles).  From  the  results  presented  here  and  from  comments  collected  from  soldiers 
participating  in  this  testing,  it  is  recommended  that  when  an  array  is  used  in  quiet,  the  signals 
should  be  led  to  the  ears  through  an  open  mold,  and  that  when  it  is  used  in  noise,  the  signals 
should  be  received  through  either  a  closed  mold  or  an  earplug.  The  open  mold  allows  soldiers  to 
hear  relatively  quiet  ambient  sounds  about  them  in  addition  to  hearing  target  sounds  from  the 
array.  The  advantage  of  a  closed  mold  or  earplug  is  that  it  would  attenuate  high-level  ambient 
noise  while  maintaining  the  levels  of  target  sounds  entering  through  the  array. 

Last,  we  have  two  suggestions.  The  current  design  of  the  LDLD  has  the  arms  hinged  at 
one  end  so  that  the  array’s  angular  coverage  can  be  varied.  An  alternative  design  worth  trying  would 
make  the  arms  of  the  array  parallel  and  separate  them  by  enough  distance  to  reproduce  the  interaural 
time  differences  produced  by  the  head.  The  angular  separation  of  the  two  high-gain  arms  of  the 
LDLD  creates  a  system  optimized  for  finding  the  null  point  where  the  inputs  to  the  two  ears 
balance,  but  the  level  differences  grow  more  rapidly  than  they  normally  do  for  unaided  listening. 
A  parallel  configuration  could  (a)  provide  appropriate  interaural  delays,  (b)  maintain  system  gain, 
and  (c)  allow  interaural  level  differences  to  grow  more  slowly. 
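
As a rough illustration of what "enough distance" might mean for parallel arms, the sketch below compares the interaural time difference (ITD) of a head with that of two receivers a fixed distance apart. It is not part of the LDLD design or of our measurements. It assumes a spherical-head (Woodworth) approximation for the head, treats each arm as a single point receiver in the free field, and uses an assumed head radius of 8.75 cm and an assumed speed of sound of 343 m/s. The function names are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at about 20 degrees C
HEAD_RADIUS = 0.0875    # m; assumed typical adult head radius


def head_itd(azimuth_deg, radius=HEAD_RADIUS, c=SPEED_OF_SOUND):
    """ITD in seconds from the Woodworth spherical-head approximation.

    Intended for a distant source at azimuths from 0 to 90 degrees.
    """
    theta = math.radians(azimuth_deg)
    return (radius / c) * (theta + math.sin(theta))


def array_itd(azimuth_deg, spacing, c=SPEED_OF_SOUND):
    """ITD in seconds for two point receivers `spacing` meters apart.

    Each arm is idealized as one point; delays inside an arm are ignored.
    """
    theta = math.radians(azimuth_deg)
    return spacing * math.sin(theta) / c


def spacing_matching_max_itd(radius=HEAD_RADIUS):
    """Arm spacing whose ITD at 90 degrees equals the head ITD at 90 degrees."""
    return radius * (math.pi / 2 + 1)


if __name__ == "__main__":
    d = spacing_matching_max_itd()
    print(f"matching spacing: {d * 100:.1f} cm")
    for az in (0, 30, 60, 90):
        head_us = head_itd(az) * 1e6
        array_us = array_itd(az, d) * 1e6
        print(f"{az:3d} deg  head {head_us:6.1f} us  array {array_us:6.1f} us")
```

Under these assumptions, a spacing chosen to match the head at 90 degrees gives somewhat larger delays than the head at smaller azimuths, because the two expressions differ in shape. A single spacing can therefore only approximate head-like delays, and the value used in a real device would have to be settled by measurement.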

Our  other  suggestion  is  that  related  applications  of  binaural  array  technology  should  be 
monitored.  For  example,  a  recent  spate  of  papers  (Soede,  Berkhout,  &  Bilsen,  1993;  Soede, 
Bilsen,  &  Berkhout,  1993;  Kates,  1993;  Stadler  &  Rabinowitz,  1993)  described  the  start  of  work 
to  develop  binaural  hearing  aids  based  on  array  technology,  and  this  work  has  led  to  innovative 
ways  to  trade  the  ability  to  localize  a  source  of  sound  against  other  design  requirements.  Welker, 
Greenberg,  Desloge,  and  Zurek  (to  be  published)  and  Desloge,  Rabinowitz,  and  Zurek  (to  be 
published)  have  both  tried  to  improve  the  perception  of  speech  heard  over  linear-array  hearing 
aids  while  trying  to  maintain  the  listener’s  appreciation  of  the  spatial  environment.  It  is  likely 
that  this  work  will  be  applicable  to  other  problems  of  binaural  listening  aided  by  linear 
microphone  arrays. 